Model Fitting

Given that we could just ask the simulation for more samples, I decided NOT to employ a typical train/test split. Rather, I will use all samples for training and optimize based on cross-validation, giving every record a turn in both the training set and the test set. Once the model is fitted, we can then ask the simulation for some more samples (say 1000) to use as a completely independent test set.
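A minimal sketch of this approach with scikit-learn, assuming the simulated data lives in a feature matrix `X` and label vector `y` (the random arrays below are placeholders for the actual simulation output):

```python
# k-fold cross-validation in place of a single train/test split:
# each fold serves as the held-out test set exactly once.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))    # stand-in for simulated features
y = rng.integers(0, 2, size=200)  # stand-in for SAS / neutral labels

rf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(rf, X, y, cv=5)  # one accuracy score per fold
print(scores.mean())
```

Averaging the per-fold scores gives a cross-validated estimate of generalization accuracy before any fresh simulation samples are requested.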

Model Evaluations

Many metrics can be used to evaluate models, some I calculate here are:

  1. Accuracy: (TP + TN) / total, the fraction of samples the RF model classifies correctly
  2. Error Rate: 1 - Accuracy, the fraction of samples the RF model classifies incorrectly
  3. True Positive Rate (TPR) | Sensitivity | Recall | Coverage: TP / (TP + FN), fraction of SAS examples correctly predicted
    • There is typically a trade-off between Recall and Precision (defined below)
  4. True Negative Rate (TNR) | Specificity: TN / (FP + TN), fraction of neutral examples correctly predicted
  5. False Positive Rate (FPR): FP / (FP + TN), fraction of neutral examples predicted as having SAS (really bad)
  6. False Negative Rate (FNR): FN / (TP + FN), fraction of SAS examples predicted as neutral, i.e., missed (not as bad, but still bad)
  7. Precision: TP / (TP + FP), fraction of samples that actually have SAS out of total samples predicted to have SAS
    • Precision addresses the question: "Given a sample predicted to have SAS, how likely is it to be correct?"
    • We may want to sacrifice Recall in order to achieve a high Precision
  8. F-measure: $\frac{2 \cdot precision \cdot recall}{precision + recall}$, the harmonic mean of precision and recall (meaning the result is *closer to the smaller of the inputs*; that is, the F-measure is closer to whichever of precision and recall is smaller in magnitude)
    • Ideally, the F-measure should be high, indicating that both precision and recall are high
  9. Area Under the Curve (AUC): the area under the Receiver Operating Characteristic (ROC) curve, which demonstrates the sensitivity-specificity trade-off (the ROC curve plots TPR against FPR)
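The metrics above can be computed directly from the four confusion-matrix counts. The counts below are made-up numbers purely for illustration:

```python
# Illustrative confusion-matrix counts (not real results from the model)
tp, fp, tn, fn = 40, 5, 45, 10
total = tp + fp + tn + fn

accuracy = (tp + tn) / total
error_rate = 1 - accuracy
tpr = tp / (tp + fn)        # sensitivity / recall: SAS examples caught
tnr = tn / (fp + tn)        # specificity: neutral examples caught
fpr = fp / (fp + tn)        # neutral examples flagged as SAS
fnr = fn / (tp + fn)        # SAS examples missed
precision = tp / (tp + fp)  # of those flagged as SAS, how many really are
f_measure = 2 * precision * tpr / (precision + tpr)

print(accuracy, tpr, precision, f_measure)
```

With these counts, accuracy is 0.85 while recall is only 0.80, showing why a single metric can hide a meaningful false-negative rate.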

Confusion Matrix

Alternative Metric: Cost Matrix

Sometimes, particularly in the health sciences, we want to punish or reward the model more for some outcomes than for others. For example, in cancer prediction, more emphasis is placed on avoiding False Negatives (failing to detect the cancer), so we may wish to assign costs/weights (negative means reward) to TP, FP, TN, and FN like this:

This can be implemented as an alternative to the metrics above during cross-validation. Alternatively, the cost matrix can be used to classify one particular record; that is, we can use the cost matrix to evaluate risk.

With a RandomForest, I am able to extract the probability of a sample showing SAS or not, say:

P(SAS) = 0.2
P(neutral, other) = 0.8

Given the above cost matrix, then when I:
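A hedged sketch of minimum-expected-cost classification for one record, using the class probabilities quoted above. The cost values below are illustrative assumptions standing in for the actual cost matrix (negative cost = reward), not the matrix used in the notebook:

```python
# Class probabilities for one record, from RandomForest predict_proba
p_sas, p_neutral = 0.2, 0.8

# cost[predicted][actual]: ASSUMED values for illustration only.
# False negatives (predicting neutral when it is SAS) are penalized most.
cost = {
    "SAS":     {"SAS": -1, "neutral": 1},   # TP rewarded, FP mildly penalized
    "neutral": {"SAS": 10, "neutral": 0},   # FN heavily penalized, TN free
}

# Expected cost of each possible prediction for this record
expected_cost = {
    pred: cost[pred]["SAS"] * p_sas + cost[pred]["neutral"] * p_neutral
    for pred in cost
}
decision = min(expected_cost, key=expected_cost.get)
print(expected_cost, decision)
```

Under these assumed costs the record is classified as SAS even though P(SAS) is only 0.2, because the expected cost of missing a SAS case outweighs the cost of a false alarm.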

Test Set Evaluation

Below are performance measures on a test set never touched during model building.

Issues

Can try a variety of bin values, but all result in low cross-validated and test accuracy...

Reason: the assumption that SAS samples will have the 'same regions that light up' is not exactly true.

From the heatmaps above, SAS samples do NOT look much different from Non-SAS samples.